Using Cost-Sensitive Learning to Determine Gene Conversions

Authors

  • Mark J. Lawson
  • Lenwood S. Heath
  • Naren Ramakrishnan
  • Liqing Zhang
Abstract

Gene conversion, the non-reciprocal transfer of genetic information from one sequence to another, is a biological process whose importance in shaping both short-term and long-term evolution cannot be overemphasized. Knowing where gene conversion has occurred gives important insights into gene duplication and into evolution in general. In this paper we present an ensemble-based learning method for predicting gene conversions using two different models of reticulate evolution. Because detecting gene conversion is a rare-class problem, we implement cost-sensitive learning in the form of a generated cost matrix that is used to modify various underlying classifiers. Results show that our method combines the predictive power of the different models and predicts gene conversion more accurately than either of the two models studied. Our work provides a useful framework for future improvement of gene conversion prediction through multiple models of gene conversion.
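The abstract does not say exactly how the generated cost matrix modifies the underlying classifiers. One common way a cost matrix enters a probabilistic classifier is the minimum-expected-cost decision rule, sketched below. The cost values, the function names (`expected_costs`, `cost_sensitive_predict`, `decision_threshold`), and the assumption that each classifier outputs a probability of gene conversion are hypothetical illustrations, not the authors' method.

```python
from typing import List, Sequence

# Hypothetical 2x2 cost matrices indexed as cost[actual][predicted].
# Class 1 = gene conversion (the rare class), class 0 = no conversion.
UNIFORM = [[0.0, 1.0], [1.0, 0.0]]       # plain 0/1 loss
MISS_COSTLY = [[0.0, 1.0], [10.0, 0.0]]  # assumed: a missed conversion costs 10x a false alarm


def expected_costs(p_positive: float, cost: Sequence[Sequence[float]]) -> List[float]:
    """Expected cost of predicting class 0 and class 1, given P(class 1)."""
    p = [1.0 - p_positive, p_positive]
    return [sum(p[actual] * cost[actual][pred] for actual in range(2)) for pred in range(2)]


def cost_sensitive_predict(p_positive: float, cost: Sequence[Sequence[float]]) -> int:
    """Pick the class with the lowest expected cost (ties go to class 0)."""
    costs = expected_costs(p_positive, cost)
    return min(range(2), key=lambda k: costs[k])


def decision_threshold(cost: Sequence[Sequence[float]]) -> float:
    """Probability above which predicting class 1 becomes the cheaper choice."""
    false_alarm = cost[0][1] - cost[0][0]
    miss = cost[1][0] - cost[1][1]
    return false_alarm / (false_alarm + miss)


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.6):
        print(p, cost_sensitive_predict(p, UNIFORM), cost_sensitive_predict(p, MISS_COSTLY))
    print("threshold:", decision_threshold(MISS_COSTLY))
```

Under the assumed matrix, a site is flagged as a conversion whenever its estimated probability exceeds 1/11 rather than the 0.5 implied by uniform costs, so the classifier accepts more false positives in exchange for missing fewer rare-class events.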




Publication date: 2008